Multi-player bandits
Distributed Multi-Player Bandits - a Game of Thrones Approach
We consider a multi-armed bandit game where N players compete for K arms for T turns. Each player has different expected rewards for the arms, and the instantaneous rewards are independent and identically distributed. Performance is measured using the expected sum of regrets, compared to the optimal assignment of arms to players. We assume that each player only knows her actions and the reward she received each turn. Players cannot observe the actions of other players, and no communication between players is possible. We present a distributed algorithm and prove that it achieves an expected sum of regrets of near-O(log^2 T). This is the first algorithm to achieve a poly-logarithmic regret in this fully distributed scenario. All other works have assumed that either all players have the same vector of expected rewards or that communication between players is possible.
Reviews: Distributed Multi-Player Bandits - a Game of Thrones Approach
This paper studies a distributed multi-armed bandit setting in which, every round, each of N players must choose one of K arms. If a player picks the same arm as some other player, she receives payoff zero; otherwise, she receives a payoff drawn from some distribution specific to that player and that arm. This can be thought of as learning a distributed allocation, or matching, from players to arms. The goal is to design a communication-free learning algorithm that maximizes the total overall utility of all the players (or equivalently, minimizes their regret with respect to the best fixed allocation). The authors design an algorithm that achieves total regret of O(log^2 T).
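The collision model described in this setting can be made concrete with a minimal Python sketch. This is an illustration of the problem setup only, not the paper's algorithm; the function name and the Bernoulli reward assumption are choices made here for clarity.

```python
import random

def play_round(choices, means, rng=random):
    """Simulate one round of the multi-player bandit with collisions.

    choices[i] is the arm picked by player i; means[i][k] is the expected
    reward of arm k for player i. Players that collide (pick the same arm
    as another player) receive zero reward; otherwise the reward is a
    Bernoulli draw with that player's mean for that arm (an illustrative
    choice -- any i.i.d. reward distribution fits the setting).
    """
    counts = {}
    for arm in choices:
        counts[arm] = counts.get(arm, 0) + 1
    rewards = []
    for player, arm in enumerate(choices):
        if counts[arm] > 1:
            rewards.append(0.0)  # collision: payoff zero
        else:
            rewards.append(1.0 if rng.random() < means[player][arm] else 0.0)
    return rewards

# Two players, two arms: player 0 prefers arm 0, player 1 prefers arm 1.
means = [[0.9, 0.1], [0.2, 0.8]]
print(play_round([0, 1], means))  # no collision: independent Bernoulli draws
print(play_round([0, 0], means))  # collision: both players get 0.0
```

Note that each player observes only her own reward, so a colliding player cannot directly tell a collision apart from an unlucky zero-reward draw; this observability constraint is what makes the fully distributed setting hard.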
A survey on multi-player bandits
Boursier, Etienne, Perchet, Vianney
Owing mostly to its application to cognitive radio networks, the multiplayer bandit problem has gained a lot of interest in the last decade, and considerable progress has been made on its theoretical side. However, current algorithms remain far from practical, and many obstacles stand between these theoretical results and an actual implementation of multiplayer bandit algorithms in real cognitive radio networks. This survey contextualizes and organizes the rich multiplayer bandits literature. In light of the existing works, some clear directions for future research emerge. We believe that further study of these directions might lead to theoretical algorithms adapted to real-world situations.
Multitask Bandit Learning through Heterogeneous Feedback Aggregation
Wang, Zhi, Zhang, Chicheng, Singh, Manish Kumar, Riek, Laurel D., Chaudhuri, Kamalika
Online multi-armed bandit learning has many important real-world applications (see Villar et al., 2015; Shen et al., 2015; Li et al., 2010, for a few examples). In practice, a group of online bandit learning agents are often deployed for similar tasks, and they learn to perform these tasks in similar yet nonidentical environments. For example, a group of assistive healthcare robots may be deployed to provide personalized cognitive training to people with dementia (PwD), e.g., by playing cognitive training games with people (Kubota et al., 2020). Each robot seeks to learn the preferences of its paired PwD so as to recommend tailored health interventions based on how the PwD reacts to and is engaged with the activities (as captured by sensors on the robots) (Kubota et al., 2020). As PwD may have similar preferences and may therefore exhibit similar reactions, a natural question arises: can the robots, as a multi-agent system, learn to perform their respective tasks faster through collaboration? In this paper, we develop multi-agent bandit learning algorithms where each agent can robustly aggregate data from other agents to better perform its respective task. We generalize the multi-armed bandit problem (Auer et al., 2002) and formulate the ɛ-Multi-Player Multi-Armed Bandit (ɛ-MPMAB) problem, which models heterogeneous multitask learning in a multi-agent bandit learning setting. In an ɛ-MPMAB problem instance, a set of M players are deployed to perform similar tasks: they simultaneously interact with a set of actions/arms, and for each arm, different players receive feedback from similar but not necessarily identical reward distributions. In the above assistive robotics example, each player corresponds to a robot; each arm corresponds to one of the cognitive activities to choose from; for each player and each arm, there is a separate reward distribution which reflects a PwD's
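The "similar but not necessarily identical" reward distributions in the ɛ-MPMAB formulation can be sketched as a bound on how far any two players' mean rewards may differ on the same arm. The function name and the exact form of the condition below are assumptions made for illustration, based on the abstract rather than the paper's formal definition.

```python
def dissimilarity(means):
    """Max gap between any two players' mean rewards on the same arm.

    means[i][k] is the mean reward of arm k for player i. Under the
    illustrative reading used here, an instance qualifies as an
    eps-MPMAB instance when this value is at most eps.
    """
    n_players = len(means)
    n_arms = len(means[0])
    gap = 0.0
    for k in range(n_arms):
        column = [means[i][k] for i in range(n_players)]
        gap = max(gap, max(column) - min(column))
    return gap

# Three players (robots) with nearly identical preferences over two arms.
means = [[0.50, 0.30], [0.55, 0.25], [0.52, 0.28]]
print(round(dissimilarity(means), 3))  # -> 0.05
```

The smaller this gap, the more a player gains from aggregating other players' feedback; when the gap is large, naive aggregation injects bias, which is why the paper emphasizes robust aggregation.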